26 research outputs found

    An experiment in audio classification from compressed data

    In this paper we present an algorithm for automatic classification of sound into speech, instrumental sound/music, and silence. The method is based on thresholding of features derived from the modulation envelope of the frequency-limited audio signal. Four characteristics are examined for discrimination: the occurrence and duration of energy peaks, rhythmic content, and the level of harmonic content. The proposed algorithm allows classification directly on MPEG-1 audio bitstreams. The performance of the classifier was evaluated on TRECVID test data; the test results are above average among all TREC participants. The approaches adopted by other research groups participating in TREC are also discussed.
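
    The decision logic is simple enough to sketch. Below is a minimal illustration of threshold-based speech/music/silence labelling on a decoded signal; the envelope smoothing, the peak definition, and all threshold values are assumptions for illustration, not the paper's settings (which also operate on compressed-domain data).

        import numpy as np

        def modulation_envelope(x, sr, cutoff_hz=20.0):
            # Crude modulation envelope: rectify, then moving-average low-pass.
            win = max(1, int(sr / cutoff_hz))
            return np.convolve(np.abs(x), np.ones(win) / win, mode="same")

        def classify(x, sr, silence_thr=0.01, peak_rate_thr=3.0):
            env = modulation_envelope(x, sr)
            if env.mean() < silence_thr:
                return "silence"
            # Energy peaks: local maxima of the envelope above its mean level.
            mid = env[1:-1]
            peaks = np.flatnonzero((mid > env[:-2]) & (mid > env[2:]) & (mid > env.mean()))
            peak_rate = len(peaks) * sr / len(x)   # peaks per second
            # Speech shows frequent syllabic energy peaks; music is steadier.
            return "speech" if peak_rate > peak_rate_thr else "music"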

    Rhythm detection for speech-music discrimination in MPEG compressed domain

    A novel approach to speech-music discrimination based on rhythm (or beat) detection is introduced. Rhythmic pulses are detected by applying a long-term autocorrelation method to band-passed signals. This approach is combined with another in which the features describe the energy peaks of the signal. The discriminator uses just three features, computed from data taken directly from an MPEG-1 bitstream. The discriminator was tested on more than 3 hours of audio data; the average recognition rate is 97.7%.
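
    The rhythm feature rests on long-term autocorrelation of a band-passed energy envelope: a strong, well-placed autocorrelation peak signals a steady beat. A minimal sketch follows; the BPM range and the normalisation are assumptions, not the paper's parameters.

        import numpy as np

        def rhythm_strength(env, env_rate, min_bpm=60, max_bpm=200):
            # env: energy envelope sampled at env_rate values per second.
            env = env - env.mean()
            ac = np.correlate(env, env, mode="full")[len(env) - 1:]
            ac = ac / (ac[0] + 1e-12)              # normalise by zero-lag energy
            lo = int(env_rate * 60 / max_bpm)      # shortest beat period (samples)
            hi = int(env_rate * 60 / min_bpm)      # longest beat period (samples)
            lag = lo + np.argmax(ac[lo:hi])
            return 60.0 * env_rate / lag, ac[lag]  # (BPM estimate, peak strength)

    A high peak strength suggests music; speech rarely sustains a stable periodicity in this range.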

    MPEG-1 bitstreams processing for audio content analysis

    In this paper, we present the MPEG-1 audio bitstream processing work in which our research group is involved. This work is primarily based on processing the encoded bitstream and extracting useful audio features for the purposes of analysis and browsing. To prepare for the discussion of these features, the MPEG-1 audio bitstream format is first described. The application programming interface (API) we have been developing in C++ is then introduced, before the paper concludes with a discussion of audio feature extraction.
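
    The paper's API is implemented in C++; the hypothetical Python sketch below conveys the same idea of walking MPEG-1 audio frame headers by syncword search so that per-frame fields (e.g. scalefactors) can be read without decoding. Class and method names are invented for illustration.

        class Mpeg1AudioStream:
            SYNC_MASK = 0xFFE0            # first 11 bits of a frame header are 1s

            def __init__(self, path):
                with open(path, "rb") as f:
                    self.data = f.read()

            def frame_offsets(self):
                # Yield byte offsets of candidate frame headers (syncword search).
                i = 0
                while i + 4 <= len(self.data):
                    hdr = int.from_bytes(self.data[i:i + 2], "big")
                    if (hdr & self.SYNC_MASK) == self.SYNC_MASK:
                        yield i
                        i += 4            # a real parser derives the frame length
                    else:                 # from the bitrate/sample-rate fields
                        i += 1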

    Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition

    This paper studies emotion recognition from musical tracks in the two-dimensional valence-arousal (V-A) emotional space. We propose a method based on convolutional (CNN) and recurrent neural networks (RNN), with significantly fewer parameters than the state-of-the-art method for the same task. We utilize one CNN layer followed by two branches of RNNs trained separately for arousal and valence. The method was evaluated using the 'MediaEval2015 emotion in music' dataset. We achieved an RMSE of 0.202 for arousal and 0.268 for valence, which is the best result reported on this dataset. (Accepted for Sound and Music Computing, SMC 2017.)
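
    The described architecture (one convolutional layer feeding two recurrent branches) can be sketched as below; the layer sizes, the input representation, and joint rather than separate training of the branches are all simplifying assumptions, not the paper's hyper-parameters.

        from tensorflow.keras import Input, Model, layers

        def rnn_branch(x, name):
            h = layers.GRU(64)(x)                    # recurrent branch for one target
            return layers.Dense(1, name=name)(h)     # continuous V or A estimate

        inp = Input(shape=(None, 40))                # (time steps, mel-band energies)
        x = layers.Conv1D(32, kernel_size=5, padding="same", activation="relu")(inp)
        model = Model(inp, [rnn_branch(x, "arousal"), rnn_branch(x, "valence")])
        model.compile(optimizer="adam", loss="mse")  # report RMSE as sqrt of MSE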

    Speech-music discrimination from MPEG-1 bitstream

    This paper describes a proposed algorithm for speech/music discrimination that works on data taken directly from the MPEG-encoded bitstream, thus avoiding the computationally expensive decoding-encoding process. The method is based on thresholding of features derived from the modulation envelope of the frequency-limited audio signal. The discriminator was tested on more than 2 hours of audio data containing clean and noisy speech from several speakers and a variety of music content. The discriminator is able to work in real time and, despite its simplicity, the results are very promising.
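
    In the compressed domain, per-frame scalefactors already provide a coarse subband envelope, which is what makes full decoding unnecessary. A toy sketch, where the subband selection and both thresholds are illustrative assumptions:

        import numpy as np

        def discriminate(scalefactors):
            # scalefactors: (num_frames, num_subbands) array read from the bitstream.
            env = scalefactors[:, :4].sum(axis=1)   # low-frequency envelope
            env = env / (env.max() + 1e-12)
            pause_ratio = np.mean(env < 0.1)        # fraction of near-silent frames
            mod_depth = np.var(np.diff(env))        # frame-to-frame modulation depth
            # Speech alternates syllables and pauses; music is more continuous.
            return "speech" if pause_ratio > 0.05 and mod_depth > 1e-3 else "music"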

    Voice Operated Information System in Slovak

    Speech communication interfaces (SCI) are nowadays widely used in several domains. Automated spoken language human-computer interaction can replace human-human interaction if needed. Automatic speech recognition (ASR), a key technology of SCI, has been extensively studied during the past few decades. Most present systems are based on statistical modeling, at both the acoustic and linguistic levels. Increased attention has recently been paid to speech recognition in adverse conditions, since noise resistance has become one of the major bottlenecks for the practical use of speech recognizers. Although many techniques have been developed, many challenges still have to be overcome before the ultimate goal of creating machines capable of communicating with humans naturally can be achieved. In this paper we describe the research and development of the first Slovak spoken language dialogue system (SLDS). The dialogue system is based on the DARPA Communicator architecture. The proposed system consists of the Galaxy hub and telephony, automatic speech recognition, text-to-speech, backend, transport, and VoiceXML dialogue management modules. The SCI enables multi-user interaction in the Slovak language. The functionality of the SLDS is demonstrated and tested via two pilot applications, "Weather forecast for Slovakia" and "Timetable of Slovak Railways". The required information is retrieved from Internet resources in multi-user mode through PSTN, ISDN, GSM and/or VoIP networks.
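
    In the Communicator pattern, a central Galaxy hub routes message frames between otherwise independent modules. The toy sketch below mirrors the module list from the abstract; the routing rule and the frame contents are invented for illustration.

        class Hub:
            def __init__(self):
                self.modules = {}                    # name -> callable(frame) -> frame

            def register(self, name, handler):
                self.modules[name] = handler

            def route(self, frame, pipeline):
                for name in pipeline:                # pass the frame module to module
                    frame = self.modules[name](frame)
                return frame

        hub = Hub()
        hub.register("asr", lambda f: {**f, "text": "weather forecast"})
        hub.register("dialogue", lambda f: {**f, "query": "weather"})
        hub.register("backend", lambda f: {**f, "answer": "sunny"})
        hub.register("tts", lambda f: {**f, "audio_out": b"..."})
        result = hub.route({"audio_in": b"..."}, ("asr", "dialogue", "backend", "tts"))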

    Dublin City University video track experiments for TREC 2002

    Dublin City University participated in the Feature Extraction task and the Search task of the TREC-2002 Video Track. In the Feature Extraction task, we submitted three features: Face, Speech, and Music. In the Search task, we developed an interactive video retrieval system which incorporated the 40 hours of the video search test collection and supported user searching using our own feature extraction data along with the donated feature data and ASR transcripts from other Video Track groups. This video retrieval system allows a user to specify a query based on the 10 features and the ASR transcript; the query result is a ranked list of videos that can be further browsed at the shot level. To evaluate the usefulness of feature-based querying, we developed a second system interface that provides only ASR transcript-based querying, and we conducted an experiment with 12 test users to compare the two systems. Results were submitted to NIST, and we are currently conducting further analysis of user performance with the two systems.
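
    The kind of combined query such an interface supports can be illustrated by scoring each shot on ASR text matches plus weighted detector confidences; the linear fusion and the weights below are assumptions, not DCU's actual ranking function.

        def score_shot(shot, query_terms, wanted_features, w_text=1.0, w_feat=0.5):
            text = shot["asr"].lower()
            text_score = sum(term in text for term in query_terms)
            feat_score = sum(shot["features"].get(f, 0.0) for f in wanted_features)
            return w_text * text_score + w_feat * feat_score

        shots = [
            {"asr": "the president spoke today", "features": {"face": 0.9}},
            {"asr": "orchestra performance", "features": {"music": 0.95}},
        ]
        ranked = sorted(shots, reverse=True,
                        key=lambda s: score_shot(s, ["president"], ["face"]))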

    Discriminative feature selection for applause sounds detection

    Specific sounds such as applause, laughter, explosions, etc. are very helpful for understanding the high-level semantics of audio/video content. The paper focuses on feature selection by evolutionary programming for automatic detection of applause in an audio stream. A set of the most discriminative features is selected by a Genetic Algorithm and by Simulated Annealing. The experiments are run on more than 9 hours of audio selected from various audio and video content. The results show that applause recognition improves if only a few coefficients are selected from the MFCC static and dynamic features. Further, the delta-delta coefficients (the second time derivatives of the MFCCs) considerably outperform the delta coefficients.
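
    A genetic algorithm for this task typically encodes each candidate feature subset as a bit mask over the MFCC vector and uses held-out detection accuracy as fitness. A minimal sketch, where evaluate is a placeholder fitness function you must supply:

        import random

        def ga_select(n_feats, evaluate, pop_size=20, gens=50, p_mut=0.05):
            pop = [[random.random() < 0.5 for _ in range(n_feats)]
                   for _ in range(pop_size)]
            for _ in range(gens):
                pop.sort(key=evaluate, reverse=True)     # fittest masks first
                parents = pop[:pop_size // 2]
                children = []
                while len(parents) + len(children) < pop_size:
                    a, b = random.sample(parents, 2)
                    cut = random.randrange(1, n_feats)
                    child = a[:cut] + b[cut:]            # one-point crossover
                    children.append([g ^ (random.random() < p_mut) for g in child])
                pop = parents + children
            return max(pop, key=evaluate)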

    Development of a System for Automatic Recognition of Speech

    The article gives a review of research on the processing and automatic recognition of speech signals (ASR) at the Department of Telecommunications of the Faculty of Electrical Engineering, University of Zilina. Ongoing research is oriented towards speech parametrization using two-dimensional cepstral analysis, and towards the application of HMMs and neural networks to speech recognition in the Slovak language. The article summarizes the results achieved and outlines the future direction of our research in automatic speech recognition.
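
    Two-dimensional cepstral analysis, roughly speaking, applies a second transform along time to a block of frame-level cepstra, so that spectral and temporal structure are captured jointly. A minimal sketch with assumed block and coefficient sizes:

        import numpy as np
        from scipy.fft import dct

        def two_d_cepstrum(cepstra, n_time=8, n_quef=12):
            # cepstra: (frames, coeffs) matrix of per-frame cepstral vectors.
            block = np.asarray(cepstra)[:n_time, :n_quef]
            return dct(block, axis=0, norm="ortho")   # DCT along the time axis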